DegExt - A Language-Independent Graph-Based Keyphrase Extractor
Abstract
In this paper, we introduce DegExt, a graph-based, language-independent keyphrase extractor, which extends the keyword extraction method described in [6]. We compare DegExt with two state-of-the-art approaches to keyphrase extraction: GenEx [11] and TextRank [8]. Our experiments on a collection of benchmark summaries show that DegExt outperforms TextRank and GenEx in terms of precision and area under the curve (AUC) for summaries of 15 or more keyphrases, at the cost of a non-significant decrease in recall and F-measure. Moreover, DegExt is simpler to implement and has lower computational complexity than both GenEx and TextRank.
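The abstract does not spell out the algorithm, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that, as the name DegExt suggests, words are nodes in a word graph, edges join words that are adjacent within a sentence after stopword removal, words are ranked by node degree, and keyphrases are reassembled from runs of adjacent top-ranked words. The stopword list, the phrase-scoring rule (sum of member degrees), the sample text, the gold set, and the parameters `top_words` and `top_phrases` are all hypothetical. The sketch also computes precision, recall, and F-measure against a gold keyphrase set; exact string matching is an assumption, since the abstract does not state the matching criterion.

```python
import re

# Hypothetical stopword list, kept tiny for illustration only.
STOPWORDS = {"a", "an", "and", "are", "as", "at", "by", "for", "in",
             "is", "of", "on", "the", "this", "that", "to", "we", "with"}

# Hypothetical sample document.
SAMPLE = ("Graph based keyphrase extraction ranks words. "
          "Keyphrase extraction uses graph degree. "
          "Degree ranking selects keyphrase words.")


def tokenize_sentences(text):
    """Split text into sentences, then each sentence into lowercase, stopword-free words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [[w for w in re.findall(r"[a-z0-9]+", s.lower()) if w not in STOPWORDS]
            for s in sentences]


def build_word_graph(sentences):
    """Undirected word graph: one node per distinct word, an edge between words adjacent in a sentence."""
    neighbors = {}
    for words in sentences:
        for word in words:
            neighbors.setdefault(word, set())  # register every word, even if isolated
        for left, right in zip(words, words[1:]):
            if left != right:
                neighbors[left].add(right)
                neighbors[right].add(left)
    return neighbors


def extract_keyphrases(text, top_words=10, top_phrases=5):
    """Rank words by degree, then join runs of adjacent top-ranked words into keyphrases."""
    sentences = tokenize_sentences(text)
    graph = build_word_graph(sentences)
    degree = {word: len(nbrs) for word, nbrs in graph.items()}
    # Highest-degree words become keywords; ties are broken alphabetically.
    keywords = set(sorted(degree, key=lambda w: (-degree[w], w))[:top_words])
    scores = {}
    for words in sentences:
        run = []
        for word in words + [None]:  # None is a sentinel that flushes the last run
            if word in keywords:
                run.append(word)
            elif run:
                # Hypothetical scoring rule: a phrase scores the sum of its words' degrees.
                scores[" ".join(run)] = sum(degree[w] for w in run)
                run = []
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return ranked[:top_phrases]


def precision_recall_f1(extracted, gold):
    """Exact-match precision, recall, and F-measure of extracted vs. gold keyphrases."""
    extracted_set, gold_set = set(extracted), set(gold)
    hits = len(extracted_set & gold_set)
    precision = hits / len(extracted_set) if extracted_set else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    phrases = extract_keyphrases(SAMPLE, top_words=3, top_phrases=5)
    print(phrases)  # ['keyphrase extraction', 'keyphrase', 'graph']
    # Hypothetical gold standard for the sample text.
    print(precision_recall_f1(phrases, ["keyphrase extraction", "graph degree"]))
```

Degree ranking needs only one pass over each node's neighbours, whereas TextRank [8] iterates a PageRank-style update until convergence; swapping in a different ranking rule in this sketch would only change how `degree` is computed.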
Similar resources
DegExt: a language-independent keyphrase extractor
In this paper, we introduce DegExt, a graph-based, language-independent keyphrase extractor, which extends the keyword extraction method described in (Litvak & Last, 2008). We compare DegExt with two state-of-the-art approaches to keyphrase extraction: GenEx (Turney, 2000) and TextRank (Mihalcea & Tarau, 2004). We evaluated DegExt on collections of benchmark summaries in two different languages: E...
Noun Compound and Named Entity Recognition and their Usability in Keyphrase Extraction
We investigate how the automatic identification of noun compounds and named entities can contribute to keyphrase extraction, and we also show how previously identified noun compounds affect named entity recognition and, vice versa, how noun compound detection is supported by identified named entities. Our experiments demonstrate that already known noun compounds yield better performance in named ...
Using Noun Phrase Heads to Extract Document Keyphrases
Automatically extracting keyphrases from documents is a task with many applications in information retrieval and natural language processing. Document retrieval can be biased towards documents containing relevant keyphrases; documents can be classified or categorized based on their keyphrases; automatic text summarization may extract sentences with high keyphrase scores. This paper describes a ...
Adaptation of a Keyphrase Extractor for Japanese Text
This paper presents some statistical observations relevant to Japanese keyphrase extraction, as well as the details of the implementation of a keyphrase extraction algorithm (called Extractor) for Japanese documents. Parts of the algorithm include an efficient method of extracting the keyphrase candidates, a way to pinpoint the most probable keyphrases using contextual information, a technique ...
Identifying important concepts from medical documents
Automated medical concept recognition is important for medical informatics such as medical document retrieval and text mining research. In this paper, we present a software tool called keyphrase identification program (KIP) for identifying topical concepts from medical documents. KIP combines two functions: noun phrase extraction and keyphrase identification. The former automatically extracts n...